# balsubramani-freund-uai-2024

This repo contains code for the paper *Convergence Behavior of an Adversarial Weak Supervision Method*.

If you use this in an academic study, please cite the paper:
```
TBA
```

This is a modified version of [this repository](https://github.com/stevenan5/wrench), which is itself a fork of the [wrench](https://github.com/JieyuZ2/wrench) repository that focuses on aggregating the provided labeling functions for certain datasets.
Two label models are added: BF (called WMRC in the code) is our own implementation, and [AMCL\_CC](https://github.com/BatsResearch/amcl/tree/main) is ported, specifically its CVXPY implementation.
Note that wherever WMRC appears in the code, it refers to the Balsubramani-Freund (BF) model.
Similarly, Snorkel refers to Data Programming (DP).

Scripts are provided to automatically run multiple methods on select datasets and record the results (0-1 Loss, Brier Score, Log Loss); methods with randomness are run more than once.
Code for the consistency experiment and its plots is also provided.
Code that creates LaTeX tables from the results of the methods on the datasets is included as well.
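
For reference, the three recorded losses can be computed from probabilistic predictions roughly as sketched below. This is a minimal illustration and not the code used in this repo: the function names are hypothetical, the Brier Score is taken as the unnormalized multi-class form (sum of squared differences over classes), and the log loss clips probabilities at an assumed `eps`. The repo's own definitions may differ in such details.

```python
import math


def zero_one_loss(probs, labels):
    """Fraction of points whose most probable class differs from the true label."""
    wrong = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=p.__getitem__)
        wrong += predicted != y
    return wrong / len(labels)


def brier_score(probs, labels):
    """Mean squared distance between the prediction and the one-hot true label."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


def log_loss(probs, labels, eps=1e-12):
    """Mean negative log-probability of the true label (clipped at eps, an assumption)."""
    total = 0.0
    for p, y in zip(probs, labels):
        total -= math.log(min(max(p[y], eps), 1.0))
    return total / len(labels)


# Toy predictions for three points and two classes (made-up numbers).
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1]
print(zero_one_loss(probs, labels), brier_score(probs, labels), log_loss(probs, labels))
```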

The results shown in the paper are all here.
Figures are in the respective dataset folders while the text files in the `results` folder contain the tables of numerical results.

Installation instructions come first, followed by a more detailed explanation of how to rerun the methods and regenerate the results seen in the paper.

## Installation
1. Create and activate a conda environment WITHOUT using the `environment.yml` file.

    `conda create --name wrench python=3.6`

    `conda activate wrench`
2. Install wrench.

    `pip install ws-benchmark==1.1.2rc0`
3. Install other dependencies (ignore the warning about numba from pip).

    `pip install -r requirements.txt`

## Running the methods
We have provided files `run_(wmrc|amcl_cc|mv|ds|snorkel|ebcc|hyperlm).py` that will automatically run the named method on the following datasets:

- Animals with Attributes
- Basketball
- Breast Cancer
- Cardiotocography
- DomainNet
- IMDB
- OBS
- SMS
- Yelp
- Youtube

WMRC (BF) requires another file to be run first to write its configurations; see below.
Majority Vote (MV), one-coin Dawid Skene (OCDS), and Hyper Label Model (HyperLM) only run once because they are deterministic.
The other methods (WMRC, Snorkel (DP), EBCC, and AMCL\_CC) all run 10 times on each dataset by default.
The 0-1 Loss, Brier Score, and Log Loss are all recorded, along with other information, in the `*.mat` file.
A `results` folder is automatically created along with a folder for each dataset.
If that folder already exists, the results in it will be overwritten!
For each dataset, a folder will be created for each method.
That is where the `*.log` and `*.mat` files can be found.
Below are additional settings and instructions for running specific methods.

### WMRC (BF)

WMRC relies on many hyperparameters, which have already been set in `write_wmrc_settings.py`.

1. Write the settings for all datasets.

    `python3 write_wmrc_settings.py`
2. Run BF.

    `python3 run_wmrc.py`

### EBCC, AMCL\_CC, MV, Snorkel, HyperLM
Running the respective `python3 run_(ebcc|amcl_cc|mv|snorkel|hyperlm).py` suffices.
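
To run everything in one go, a small driver along the following lines may be convenient. This is only a sketch: the script names are the ones listed in this README, while the helper names (`build_commands`, `run_all`), the run order, and the `dry_run` flag are assumptions made for illustration. It is meant to be started from the repository root.

```python
import subprocess
import sys

# Method names as they appear in the run_*.py script names in this README.
METHODS = ["wmrc", "amcl_cc", "mv", "ds", "snorkel", "ebcc", "hyperlm"]


def build_commands(methods=METHODS, python=sys.executable):
    """Return the list of commands to run, one per script (hypothetical helper)."""
    commands = []
    for method in methods:
        if method == "wmrc":
            # WMRC (BF) needs its settings written before it is run.
            commands.append([python, "write_wmrc_settings.py"])
        commands.append([python, f"run_{method}.py"])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first script that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default; pass dry_run=False to actually launch the scripts.
    run_all(build_commands(), dry_run=True)
```

Keep in mind that rerunning the scripts overwrites existing results.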

## BF Consistency on Synthetic Data
To run the experiment demonstrating BF's consistency on data generated under the one-coin DS assumption, do the following.

0. (Optional) Generate new data.

    `python3 generate_DS_synthetic.py`
1. Run BF.

    `python3 run_bf_synthetic_consistency.py`
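
For intuition, data under a one-coin DS model can be simulated as in the sketch below: each labeling function has a single accuracy and outputs the true label with that probability, otherwise a wrong label. This is not the code in `generate_DS_synthetic.py`. The function name, the uniform class prior, the uniform choice among wrong labels, and the absence of abstentions are all assumptions made for illustration.

```python
import random


def generate_one_coin_ds(n_points, accuracies, n_classes=2, seed=0):
    """Sample true labels and labeling-function votes from a one-coin DS model.

    accuracies[j] is the probability that labeling function j outputs the
    true label; otherwise it outputs a uniformly chosen wrong label.
    """
    rng = random.Random(seed)
    # Assumption: true labels are drawn uniformly over the classes.
    labels = [rng.randrange(n_classes) for _ in range(n_points)]
    votes = []
    for y in labels:
        row = []
        for acc in accuracies:
            if rng.random() < acc:
                row.append(y)
            else:
                wrong = [k for k in range(n_classes) if k != y]
                row.append(rng.choice(wrong))
        votes.append(row)
    return labels, votes


# Example: 5 points, three labeling functions with different accuracies.
labels, votes = generate_one_coin_ds(5, [0.9, 0.7, 0.6], seed=1)
for y, row in zip(labels, votes):
    print(y, row)
```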

## Result Visualizations and Table Generation
We have also included code to visualize the predictions of each method (figures are placed in the respective dataset folders) along with code that automatically generates LaTeX tables containing all results (placed in the `results` folder).

- BF Epistemic Error Breakdown (with comparison to DS).

    These plots show the sources of epistemic error for BF.
    They also show relevant quantities of epistemic error for one-coin DS.
    One can also choose to include the total one-coin DS epistemic error (set `include_ds=True`), but that often obscures the other parts of the figure.

    `python3 plot_bf_error_breakdown.py`

- BF Consistency Visualization

    This shows the distance between the BF prediction and the underlying label distribution in a log-log scale graph.

    `python3 plot_bf_synthetic_consistency.py`

- Result t-test and Aggregation

    Performs a two-sided t-test on the error rates (each of the three losses).
    Also aggregates data from results for each dataset.
    This must be run before making the loss tables (below).

    `python3 result_t_test.py`

- Table Generation

    Lastly, we provide a script to generate LaTeX tables for Log Loss and for 0-1 Loss/Brier Score (the latter two are combined into one table).
    Note that you must use latextable version 1.0.0 to get multicolumn tables.
    That is needed to create the combined 0-1 Loss/Brier Score table.
    However, Python 3.6 (which is required for the `wrench` package) is too old for that version, so one must use a newer version of Python; 3.9+ should work.

    `conda deactivate`

    `pip install latextable==1.0.0`

    `python3 make_loss_tables.py`
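
As an illustration of the t-test step listed above, the sketch below computes a two-sided p-value for the difference in mean loss between two sets of runs using only the standard library. It assumes Welch's unequal-variance variant; `result_t_test.py` may use a different variant, and all function names and numbers here are hypothetical. The p-value comes from the regularized incomplete beta function, evaluated with a standard continued fraction.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of a Student t statistic with df degrees of freedom."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(a, b):
    """Return (t, df, p) for Welch's test; needs nonzero variance in a or b."""
    na, nb = len(a), len(b)
    va = statistics.variance(a) / na
    vb = statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df, t_two_sided_p(t, df)


# Made-up losses from repeated runs of two hypothetical methods.
losses_a = [0.10, 0.12, 0.11, 0.13]
losses_b = [0.20, 0.22, 0.21, 0.23]
t, df, p = welch_t_test(losses_a, losses_b)
print(f"t = {t:.3f}, df = {df:.2f}, p = {p:.2e}")
```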


## Miscellaneous Information
We store a simplified version of `wrench` datasets as `.mat` files.
They contain train, validation, and possibly test labels and labeling function predictions.

## Acknowledgements
We thank Verónica Álvarez for curating and processing the datasets into the `.mat` files found here.
